

CLAIMS:

What is claimed is:

1. A computer implemented method of adding each of a first plurality of elements of a first packed data together to produce a first result comprising the steps of:
  - producing a first plurality of portions of a plurality of partial products using a first plurality of partial product selectors in a multiplier, each of the first plurality of portions of the plurality of partial products being zero; and
  - inserting each of the first plurality of elements into one of a second plurality of portions of the plurality of partial products using a second portion of a plurality of partial product selectors, each of the second plurality of portions of the plurality of partial products being aligned; and
  - adding each of the first plurality of elements together to produce the first result including a field having the sum of the first plurality of elements.
2. The computer implemented method of Claim 1 further comprising the step of shifting the first result to produce a second result having the field aligned with the least significant bit of the second result.
3. The computer implemented method of Claim 1 wherein the first plurality of elements consists of eight elements, each of the first plurality of elements being an unsigned byte.
4. The computer implemented method of Claim 1 wherein the multiplier has zero as one operand.

5. The computer implemented method of Claim 1 further comprising the steps of:
  - producing a fourth packed data having a fourth plurality of elements and a plurality of sign bits, each of the fourth plurality of elements and the plurality of sign bits being computed by subtracting one of a second plurality of elements of a second packed data from a corresponding one of a third plurality of elements of a third packed data; and
  - producing the first packed data, each of the first plurality of elements being computed by subtracting one of the fourth plurality of elements from the corresponding one of an at least one element, if the corresponding one of the plurality of sign bits is in a first state; and adding one of the fourth plurality of elements to the corresponding one of the at least one element, if the corresponding one of the plurality of sign bits is in a second state.

6. The computer implemented method of Claim 1 further comprising the steps of:
  - producing a fourth packed data having a fourth plurality of elements, each of the fourth plurality of elements being the maximum value of one of a second plurality of elements of a second packed data and one of a third plurality of elements of a third packed data;
  - producing a fifth packed data having a fifth plurality of elements, each of the fifth plurality of elements being the minimum value of one of the second plurality of elements and one of the third plurality of elements; and
  - producing the first packed data, each of the first plurality of elements being one of the fifth plurality of elements minus the corresponding one of the fourth plurality of elements.

7. The computer implemented method of Claim 1 further comprising the steps of:
  - producing a fourth packed data having a fourth plurality of elements, each of the fourth plurality of elements being one of a second plurality of elements of a second packed data minus the corresponding one of a third plurality of elements of a third packed data saturated at zero;
  - producing a fifth packed data having a fifth plurality of elements, each of the fifth plurality of elements being one of the third plurality of elements minus the corresponding one of the second plurality of elements saturated at zero;
  - producing the first packed data, each of the first plurality of elements being a bitwise logical OR of one of the fourth plurality of elements and the corresponding one of the fifth plurality of elements.
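
As an illustration only, and not part of the claims, the steps recited in Claim 7 can be sketched in software. The sketch assumes eight unsigned byte elements as in Claim 3; the function names `sat_sub_u8` and `abs_diff_u8` and the sample values are hypothetical. Because one of the two saturated differences is always zero for any element pair, the bitwise OR yields the absolute difference of that pair.

```python
def sat_sub_u8(a, b):
    """Per-element unsigned subtraction a - b, saturated at zero."""
    return [max(x - y, 0) for x, y in zip(a, b)]


def abs_diff_u8(second, third):
    """Model of the Claim 7 steps on lists of unsigned bytes (assumed layout)."""
    fourth = sat_sub_u8(second, third)  # second minus third, saturated at zero
    fifth = sat_sub_u8(third, second)   # third minus second, saturated at zero
    # One of the two values is zero for every pair, so OR selects the other.
    return [f | g for f, g in zip(fourth, fifth)]


# Hypothetical sample data: eight unsigned bytes per packed operand.
second = [10, 200, 7, 0, 255, 31, 64, 128]
third = [12, 100, 7, 5, 0, 30, 65, 127]

first = abs_diff_u8(second, third)
print(first)       # [2, 100, 0, 5, 255, 1, 1, 1]
print(sum(first))  # 365, the value a horizontal add of these elements would give
```

Here `sum` merely stands in for the horizontal add of Claim 1; it does not model the multiplier.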

8. An apparatus for performing a horizontal add of a first packed data having a first plurality of elements comprising:
  - a multiplier operable to produce a first plurality of portions of a plurality of partial products using a first plurality of partial product selectors, each of the first plurality of portions of the plurality of partial products being zero and inserting each of the first plurality of elements into one of a second plurality of portions of the plurality of partial products using a second plurality of partial product selectors, each of the second plurality of portions of the plurality of partial products being aligned; and
  - an adder tree configured to receive the plurality of partial products and produce a first result including a field having the sum of the first plurality of elements.

9. The apparatus of Claim 8 wherein the first plurality of partial product selectors are coupled to receive a portion of an operand, the operand being zero.

10. The apparatus of Claim 9 wherein the operand has a single element.
11. The apparatus of Claim 9 wherein the operand is a packed data.
12. The apparatus of Claim 8 wherein the adder tree comprises:
  - a plurality of carry save adders operable to receive the plurality of partial products and produce a set of carry and sum signals; and
  - a carry lookahead adder operable to receive the set of carry and sum signals and generate the first result, the first result being the sum of the set of carry and sum signals.
13. The apparatus of Claim 8 further comprising a shifter coupled to receive the first result and shift the first result to produce a second result having the field aligned with the least significant bit of the second result.
14. The apparatus of Claim 8 further comprising:
  - a first circuit configured to receive a second packed data having a second plurality of elements and a third packed data having a third plurality of elements to produce a fourth packed data having a fourth plurality of elements, each of the fourth plurality of elements being the maximum value of one of the second plurality of elements and one of the third plurality of elements;
  - a second circuit configured to receive the second packed data and the third packed data to produce a fifth packed data having a fifth plurality of elements, each of the fifth plurality of elements being the minimum value of one of the second plurality of elements and one of the third plurality of elements; and
  - a third circuit configured to receive the fourth packed data and the fifth packed data to produce the first packed data, each of the first plurality of elements being one of the fifth plurality of elements minus the corresponding one of the fourth plurality of elements.

15. The apparatus of Claim 8 further comprising:
  - a first circuit configured to receive a second packed data having a second plurality of elements and a third packed data having a third plurality of elements to produce a fourth packed data having a fourth plurality of elements, each of the fourth plurality of elements being one of the second plurality of elements minus the corresponding one of the third plurality of elements saturated at zero;
  - a second circuit configured to receive the second packed data and the third packed data to produce a fifth packed data having a fifth plurality of elements, each of the fifth plurality of elements being one of the third plurality of elements minus the corresponding one of the second plurality of elements saturated at zero; and
  - a third circuit configured to receive the fourth packed data and the fifth packed data to produce the first packed data, each of the first plurality of elements being a bitwise logical OR of one of the fourth plurality of elements and the corresponding one of the fifth plurality of elements.
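
As an illustration only, and not part of the claims, the data path recited in Claims 8, 12 and 13 can be modeled behaviorally. The sketch below makes several assumptions that the claims do not state: the aligned field starts at bit 16 (`FIELD_SHIFT`), the field is 11 bits wide (`FIELD_BITS`, enough for eight bytes of 255), and Python integers stand in for the partial-product rows. All names are hypothetical. Every portion of a row outside the field is zero, the carry save adders reduce the rows to a sum and a carry word, and an ordinary addition stands in for the carry lookahead adder.

```python
FIELD_SHIFT = 16  # assumed bit offset of the aligned field (not given in the claims)
FIELD_BITS = 11   # assumed field width: 8 * 255 = 2040 < 2**11


def partial_products(elements):
    """One row per element: the element inserted at the aligned field, zeros elsewhere."""
    return [e << FIELD_SHIFT for e in elements]


def carry_save_add(a, b, c):
    """3:2 compressor: returns (sum, carry) with a + b + c == sum + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def adder_tree(rows):
    """Reduce the rows with carry save adders, then do one final addition."""
    rows = list(rows)
    while len(rows) > 2:
        a, b, c = rows.pop(), rows.pop(), rows.pop()
        rows.extend(carry_save_add(a, b, c))
    return sum(rows)  # stands in for the carry lookahead adder


def horizontal_add(elements):
    """Return (first_result, second_result) in the sense of Claims 8 and 13."""
    first_result = adder_tree(partial_products(elements))
    # Shifter: align the sum field with the least significant bit.
    second_result = (first_result >> FIELD_SHIFT) & ((1 << FIELD_BITS) - 1)
    return first_result, second_result


first_result, second_result = horizontal_add([2, 100, 0, 5, 255, 1, 1, 1])
print(second_result)  # 365
```

The model captures only the arithmetic; selector encoding, row count and timing in an actual multiplier are outside its scope.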
